Exploring factors influencing the severity of pregnancy anemia in India: a study using proportional odds model

Pregnancy-associated anemia is a significant health issue with negative consequences for both the mother and the developing fetus. This study explores the factors associated with anemia among pregnant women in India, using data from the Demographic and Health Survey 2019–21. Chi-squared and gamma tests were conducted to examine the relationship between anemia and various socioeconomic and sociodemographic factors. Furthermore, ordinal logistic regression and multinomial logistic regression were used to gain deeper insight into the factors that affect anemia among pregnant women in India. According to our findings, anemia affects about 50% of pregnant women in India. Anemia is significantly associated with factors such as geographical location, level of education, and wealth index. The results indicate that enhancing education and socioeconomic status may serve as viable approaches for reducing the prevalence of anemia among pregnant women in India. Employing both ordinal and multinomial logistic regression provides a more comprehensive understanding of the risk factors associated with anemia, enabling the development of targeted interventions to prevent and manage this condition. This paper aims to enhance the efficacy of anemia prevention and management strategies for pregnant women in India by offering an in-depth understanding of the factors that contribute to anemia.

Anemia is a global health issue affecting pregnant women and is a leading cause of maternal morbidity and mortality 1,2. Anemia is characterized by a hemoglobin (Hb) level lower than 11 g/dL 3. Pregnancy-associated anemia can be caused by inadequate dietary intake, insufficient iron consumption, or pre-existing conditions 6–8. In India, an alarming proportion of pregnant women suffer from anemia, which makes it a major public health concern 4,5. The Indian government has taken various measures to prevent anemia in pregnant women, including distributing iron and folic acid supplements and conducting nutrition education programs 9,10. Despite these preventive measures, anemia remains a common problem among pregnant women in India. It is therefore necessary to investigate the factors that contribute to anemia in India.
The Demographic and Health Surveys (DHS) program 11 collects data on population, health, and nutrition in more than 90 nations, including India. DHS also provides data on a range of health indicators, such as anemia, tuberculosis (TB), and high blood pressure (BP). Processing the DHS data can help determine the factors correlated with anemia within this particular demographic. This study aims to examine two points regarding anemia using the DHS dataset: first, the prevalence of anemia among pregnant women in India and, second, the factors associated with it. To assess the association between anemia and factors such as geographic region, place of residence, education level, source of drinking water, socioeconomic status, religion, iodine intake, toilet facility, body mass index (BMI), and age, we employ the chi-squared test and the gamma test. Furthermore, we used multinomial logistic regression to

Organization of the paper
First, we present an overview of anemia and its effects on maternal and fetal health, and review the existing literature on the prevalence and contributing factors of anemia during pregnancy in India. The Materials and Methods section describes the data and methods employed in our analysis. Chi-squared and gamma tests were employed to evaluate the association between anemia and sociodemographic and economic variables, while ordinal and multinomial logistic regression were used to identify the most significant determinants of anemia. In the Results section, we present the results of our analyses, which are then discussed in the Discussion section. The limitations of our study and directions for future work are discussed in a later section. Lastly, we close the paper with the Conclusion section, summarizing our main findings.

Data sources
The primary data for this study were obtained from the Demographic and Health Survey (DHS) dataset, which was conducted in the years 2019 to 2021. The DHS is a nationally representative survey conducted in numerous countries, including India. The aim of the survey is to collect key information about public health, demographics, and socioeconomic factors countrywide. The dataset is available on the DHS website 11 upon request and does not contain any identifiable information about the study participants. The DHS 2019–2021 dataset was selected to offer a current perspective on the factors that contribute to the severity of pregnancy anemia in India. The DHS follows a nationally representative sampling procedure, ensuring the inclusion of diverse demographics. Additionally, the DHS measures anthropometric indicators objectively and collects a variety of monitoring and impact assessment statistics. The survey also has a high response rate, further enhancing the reliability and validity of the gathered data. A total of 27,494 respondents were included in this study after restricting the data to pregnant women and removing records with missing values.

Variables
In this research, we examined the anemia status of pregnant women, which was initially categorized as severe, moderate, mild, and not anemic. To enhance the statistical analysis, we merged 'severe' and 'moderate' into 'severe to moderate', and recategorized the target variable into three groups: severe to moderate, mild, and not anemic. We used explanatory variables at both the individual and household levels: type of residence, source of drinking water, type of toilet facility, sex of household head, wealth index, religious affiliation of household head, educational level, geographic region, iodine level in salt, high blood pressure, BMI, and age 12–14. The variables "Wealth index" and "Educational level" have ordinal categories, while the others have nominal categories. The categories for each variable and the corresponding anemic status are provided in Table 1. Latrines flushing to a piped sewer system, flushing to a septic tank, or flushing to a pit latrine, ventilated improved pit latrines, pit latrines with slab, and composting toilets are considered safe toilet systems, and all others are considered unsafe 28,29. Similarly, water piped into the dwelling, water piped to the yard, public tap or standpipe, tube well or borehole, protected dug well, protected spring, rainwater, and bottled water are considered safe water sources. All other types of water sources are considered unsafe 29,30.
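The recoding described above can be sketched in a few lines of Python. This is only an illustration: the label strings and the function names `recode_anemia` and `classify` are hypothetical, and the actual DHS category codes may be spelled differently.

```python
# Hypothetical label strings; the real DHS codes and spellings may differ.
SAFE_TOILETS = {
    "flush to piped sewer system",
    "flush to septic tank",
    "flush to pit latrine",
    "ventilated improved pit latrine",
    "pit latrine with slab",
    "composting toilet",
}

SAFE_WATER = {
    "piped into dwelling",
    "piped to yard",
    "public tap or standpipe",
    "tube well or borehole",
    "protected dug well",
    "protected spring",
    "rainwater",
    "bottled water",
}


def recode_anemia(level):
    """Merge 'severe' and 'moderate' into one category; keep the other two."""
    level = level.strip().lower()
    if level in {"severe", "moderate"}:
        return "severe to moderate"
    if level in {"mild", "not anemic"}:
        return level
    raise ValueError(f"unknown anemia level: {level!r}")


def classify(label, safe_labels):
    """Return 'safe' if the label is in the given safe set, otherwise 'unsafe'."""
    return "safe" if label.strip().lower() in safe_labels else "unsafe"


print(recode_anemia("Moderate"))
print(classify("Pit latrine with slab", SAFE_TOILETS))
print(classify("Unprotected spring", SAFE_WATER))
```

Keeping the safe categories in explicit sets makes the classification rule easy to audit against the cited definitions.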

Ordinal logistic regression
Ordinal logistic regression is a statistical method used to model the relationship between an ordinal outcome variable and one or more predictor variables 31,32. Unlike binary logistic regression, which is used for binary outcomes, ordinal logistic regression is used for outcomes that have three or more ordered categories. In this model, the cumulative probability of the outcome falling in or below a particular category is transformed into a logit value that is linearly related to the predictor variables. The cumulative logit function can be written as 33,34:

$$\operatorname{logit}\big(P(Y \le j)\big) = \ln\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right) = \beta_{j0} + \beta_{j1} x_1 + \cdots + \beta_{jp} x_p \tag{1}$$

where $P(Y \le j)$ indicates the probability of the outcome variable being less than or equal to category j, $\beta_{j0}$ represents the intercept of category j, and $\beta_{j1}$ to $\beta_{jp}$ correspond to the coefficients associated with the predictor variables $x_1$ to $x_p$, respectively.
Equation 1 links the predictor variables to the log odds of the outcome falling in category j or below. For each category j, the coefficients describe how these cumulative log odds change with the predictors, and the proportional odds model assumes that this effect is constant across all categories, so that only the intercepts differ between categories. Ordinal logistic regression is used in many fields of science, such as psychology, medicine, and the social sciences, wherever researchers are interested in predicting outcomes that have multiple ordered categories. Both categorical and continuous predictor variables can be used in the model, and it is typically fitted by maximum likelihood estimation.
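As a rough illustration of how cumulative logits translate into category probabilities, the sketch below assumes the proportional odds form, in which all cumulative logits share the same slope coefficients and differ only in their intercepts. The function names and all numeric values are hypothetical and are not estimates from this study.

```python
import math


def logistic(z):
    """Inverse of the logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(intercepts, slopes, x):
    """Category probabilities under a proportional odds model.

    intercepts: increasing intercepts, one per cumulative logit
                (J - 1 values for J ordered categories).
    slopes:     one coefficient per predictor, shared by all cumulative logits
                (the proportional odds assumption).
    x:          predictor values for one individual.
    """
    eta = sum(b * xi for b, xi in zip(slopes, x))
    # P(Y <= j) for j = 1..J-1, followed by P(Y <= J) = 1.
    cumulative = [logistic(a + eta) for a in intercepts] + [1.0]
    # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


# Hypothetical intercepts and slope for a three-category ordinal outcome.
probs = category_probabilities(intercepts=[-1.0, 0.5], slopes=[0.3], x=[1.0])
print([round(p, 3) for p in probs])
```

The intercepts must be increasing so that the cumulative probabilities are ordered and every category probability is non-negative.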

Multinomial logistic regression
Multinomial logistic regression is a technique for modeling the relationship between a categorical outcome and one or more predictor variables 35,36. It is an extension of binary logistic regression designed to handle scenarios where the dependent variable has more than two categories. In this approach, a logistic regression model is created for each category of the dependent variable, comparing it to a reference category. We calculate the probability of each category and predict the one with the highest probability. Assuming Y represents the categorical dependent variable with K categories, and X denotes the predictor variables, the model can be expressed as follows 37,38:

$$P(Y = k \mid X) = \frac{\exp\left(\beta_{0k} + \beta_{1k} X_1 + \beta_{2k} X_2 + \cdots + \beta_{pk} X_p\right)}{\sum_{l=1}^{K} \exp\left(\beta_{0l} + \beta_{1l} X_1 + \beta_{2l} X_2 + \cdots + \beta_{pl} X_p\right)}$$

where $P(Y = k \mid X)$ represents the probability that Y belongs to category k, $\beta_{0k}, \beta_{1k}, \beta_{2k}, \ldots, \beta_{pk}$ are the coefficients of the predictor variables for category k, $X_1, X_2, \ldots, X_p$ are the predictor variables, and K is the total number of categories in the dependent variable.
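The prediction step described above (compute a probability for each category, then pick the largest) can be sketched as follows. The sketch assumes the common softmax form of the model with the reference category's coefficients fixed at zero; the function name and the coefficient values are hypothetical.

```python
import math


def multinomial_probabilities(coefs, x):
    """Category probabilities for a multinomial logistic model.

    coefs: one list [b0, b1, ..., bp] per category. By assumption, the
           reference category is represented by all-zero coefficients.
    x:     predictor values [x1, ..., xp] for one individual.
    """
    scores = [c[0] + sum(b * xi for b, xi in zip(c[1:], x)) for c in coefs]
    top = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients: category 0 is the reference category.
coefs = [
    [0.0, 0.0],   # reference category
    [0.2, -0.1],  # second category vs. reference
    [0.5, 0.3],   # third category vs. reference
]
probs = multinomial_probabilities(coefs, x=[1.0])
predicted = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], "predicted category index:", predicted)
```

Exponentiating a non-reference coefficient gives the odds ratio of that category relative to the reference category for a one-unit change in the predictor.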

Chi-squared test
The chi-squared test is a statistical procedure used to evaluate whether there is a significant relationship between two unordered categorical variables 39,40. It is based on the chi-squared test statistic, which measures the difference between the observed frequencies of the two variables and the frequencies expected under the assumption that there is no association. The chi-squared test statistic can be mathematically formulated as 41,42:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Here, $O_{ij}$ is the observed frequency in the ith row and jth column of the contingency table, $E_{ij}$ is the expected frequency under the assumption of no association, and the contingency table comprises r rows and c columns.
If the computed chi-squared test statistic exceeds the critical value derived from the chi-squared distribution with (r − 1) × (c − 1) degrees of freedom, the null hypothesis of no association is rejected. Accordingly, the alternative hypothesis, which indicates a significant relationship between the variables under study, is accepted. The test is a standard tool for hypothesis testing and can inform decisions and policies.
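A minimal sketch of the chi-squared computation for a contingency table is given below. The function name and the counts are hypothetical, and expected frequencies are taken as (row total × column total) / grand total, the usual estimate under no association.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2 x 2 table of counts (not data from this study).
stat, dof = chi_squared_statistic([[10, 20], [20, 10]])
print(round(stat, 2), dof)  # about 6.67 with 1 degree of freedom
```

In this toy table the statistic exceeds 3.841, the 5% critical value of the chi-squared distribution with one degree of freedom, so the null hypothesis of no association would be rejected at that level.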

Gamma test
The gamma test is a statistical technique used to determine whether there is an association between two ordinal variables. The strength of the association is given by the value of the gamma coefficient 43,44, which can be calculated as 45,46:

$$\gamma = \frac{n_{\text{concordant}} - n_{\text{discordant}}}{n_{\text{concordant}} + n_{\text{discordant}}}$$

Here, $n_{\text{discordant}}$ is the number of discordant observational pairs (pairs that have different relative orders on the two variables), and $n_{\text{concordant}}$ is the number of concordant observational pairs (pairs that have the same relative order on both variables). The denominator is the total number of concordant and discordant pairs; pairs tied on either variable are not counted.
The gamma coefficient ranges from a minimum of −1 to a maximum of 1. The value −1 indicates complete disagreement between the two variables, 1 indicates complete agreement, and 0 indicates no association. The gamma test is used in a wide variety of fields such as education, psychology, and sociology to measure the relationship between ordinal variables, such as education and income, or job satisfaction and employment length. It is a useful tool for evaluating the strength of association between variables that cannot be measured on a continuous scale.
The chi-squared test treats the categories of both variables as unordered, so it does not use the ordering information carried by ordinal data. The gamma test is built on that ordering and requires neither equally spaced categories nor a linear relationship, so it can measure the strength of association between ordinal variables that are not linearly related to each other. Therefore, the gamma test may be more appropriate than the chi-squared test for analyzing the relationship between two ordinal variables.
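The gamma coefficient can be computed directly from a contingency table by counting concordant and discordant pairs, as in the following sketch. It assumes that rows and columns are both listed in increasing order of their ordinal categories; the function name and the counts are hypothetical.

```python
def gamma_coefficient(table):
    """Gamma coefficient for an ordered r x c contingency table of counts.

    Rows and columns must both be listed in increasing category order.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a strictly later row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables: concordant pairs
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other: discordant
                        discordant += table[i][j] * table[k][m]
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical 2 x 2 table: 100 concordant and 25 discordant pairs.
print(gamma_coefficient([[10, 5], [5, 10]]))  # prints 0.6
```

Pairs that share a row or a column are tied on one variable and contribute to neither count, which is why only cells in strictly later rows and different columns are compared.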

Bivariate analyses
Bivariate analysis is one of the simplest statistical approaches for finding an association between two variables. Chi-squared and gamma tests were used to analyze the relationship between anemia and each explanatory variable. Table 2 indicates that, except for "Result of salt test", "High BP", and "Source of drinking water", all variables are statistically significant. Among them, "Highest educational level", "Wealth index", and "Geographic region" are the most strongly associated with anemia. Furthermore, Indian women with improved toilet facilities have a lower risk of maternal anemia.

Multivariate analysis
Multivariate analysis allows the relationship between a categorical dependent variable and multiple independent variables to be analyzed. As discussed, both ordinal and multinomial logistic regression were used for the multivariate analysis. The source of drinking water, high BP, and salt test results were found to be insignificant by both the bivariate and the multivariate analyses, so these variables are not discussed in the following sections. Several noteworthy observations were made during these analyses and are briefly discussed below.

Ordinal logistic regression
Ordinal logistic regression accounts for the ordered nature of the target variable, that is, the severity of anemia. Table 3 shows the findings of the ordinal logistic regression. This analysis found relationships between several critical factors and anemia. The main observations are highlighted below:
• Pregnancy-associated anemia is more common in rural India than in urban areas. Urban pregnant women are 9.4% less susceptible to severe to moderate anemia than their rural counterparts (OR = 1.094, p = 0.007).
• Socioeconomic status significantly influences the prevalence of pregnancy-associated anemia in India. The poorest women are 34.2% more likely to be affected by severe to moderate anemia than the richest (OR = 0.658, p < 0.001), and this excess likelihood shrinks as the wealth index improves to poorer (OR = 0.768, p < 0.001), middle (OR = 0.833, p < 0.001), and richer (OR = 0.846, p < 0.001).
• Educated women are observed to be less prone to anemia. With higher education as the reference level, pregnant women with minimal or no education were 37% (OR = 0.621, p < 0.001), those with primary education were 32% (OR = 0.675, p < 0.001), and those with secondary education were 22.2% (OR = 0.778, p < 0.001) more vulnerable to anemia. The higher the education level of the individual, the lower the odds of being anemic.
• Type of toilet facility is another factor associated with anemia in pregnant Indian women. Women using improved toilet facilities are 7.5% less likely to be severely to moderately anemic (OR = 1.075, p = 0.032).
• To analyze anemia prevalence across geographic regions, the Southern region of India was taken as the reference category. The Northeastern and Northern regions have a comparatively lower likelihood of severe to moderate anemia (OR = 1.359, p < 0.001; OR = 1.130, p = 0.003), whereas Eastern parts of India have 17.4% more prevalence of anemia than the Southern parts (OR = 0.826, p < 0.001).
• The spread of anemia differs among religious groups. Christian pregnant women have the lowest odds of being anemic, that is 31.2% (OR = 1.688, p < 0.001), and Muslim pregnant women have a 74.5% (OR = 1.255, p < 0.001) probability of being anemic when compared with the baseline group.
• Underweight pregnant mothers are found to have the highest likelihood of anemia (OR = 0.993, p = 0.914). Moreover, the odds of being anemic decrease as BMI increases when the obese group is taken as the reference category. Pregnant women of healthy weight have 3.1% (OR = 1.031, p = 0.618) and overweight pregnant women have 6.6% (OR = 1.066, p = 0.328) lower odds of being anemic, respectively.
• Age is another factor associated with pregnancy-associated anemia. Pregnant teenagers have the highest probability of being anemic (OR = 0.951, p = 0.742), and the rate decreases as the age of pregnant women increases.

Multinominal logistic regression
Multinomial logistic regression treats the target variable as a set of unordered categories defining different degrees of anemia. Table 4 summarizes the outcome of the multinomial logistic regression, which illustrates the relationship between the estimated parameters of the covariates used. The key findings are listed below:
• Table 4 reveals a statistically significant difference in anemia prevalence between rural and urban pregnant women, with urban women exhibiting 9.5% lower odds of being severely to moderately anemic (OR = 0.905, p = 0.021) and 9.1% lower odds of being mildly anemic (OR = 0.909, p = 0.029) compared to rural pregnant women.
• The type of family structure (patriarchal or matriarchal) did not have a significant impact on the prevalence of severe to moderate anemia, but it does influence the prevalence of mild anemia. Indian women
• Mild anemia is also 33.3% and 17.9% more prevalent in the poorest and poorer pregnant women compared to the richest group.
• Educational status also has a significant impact on anemia prevalence. With "higher" education as the baseline category, lower levels of education are associated with higher odds of both severe to moderate anemia (OR = 1.646, p < 0.001; OR = 1.376, p < 0.001) and mild anemia (OR = 1.272, p < 0.001; OR = 1.182, p < 0.001). The absence of formal education ("no education", OR = 1.836, p < 0.001; OR = 1.316, p < 0.001) is associated with an increased probability of both severe to moderate and mild anemia. Increased levels of education decrease the likelihood of anemia.
• The type of toilet system used also plays an important role in anemia prevalence. Pregnant women using safer sanitation systems are 9.2% less vulnerable to severe to moderate anemia than those without proper toilet facilities (OR = 0.908, p = 0.024).
• The prevalence of anemia varies across geographical locations. With the Southern region as the reference category, Northeastern regions have 30.4% (OR = 0.696, p < 0.001) and 21.2% (OR = 0.788, p = 0.001) lower likelihood of severe to moderate and mild maternal anemia, respectively. In the Northern states of India, the probability of severe to moderate anemia is 11.2% (OR = 0.888, p = 0.027) and that of mild anemia is 16.4% (OR = 0.836, p = 0.001) lower than in Southern India. On the other hand, pregnant women residing in Eastern India are 26.8% more prone to severe to moderate anemia (OR = 1.268, p < 0.001) and 29.6% more prone to mild anemia (OR = 1.296, p < 0.001) compared to the baseline category.
• Religious affiliation is also associated with anemia severity. With the "other/no religion" group as the reference, Muslim pregnant women exhibited a 26.5% lower likelihood of severe to moderate anemia (OR = 0.735, p < 0.001). Similarly, Christian women displayed a 48.1% lower probability of severe to moderate anemia (OR = 0.519, p < 0.001) and a 27.3% lower probability of mild anemia (OR = 0.727, p = 0.001).
• Maternal anemia is associated with age as well, especially in the case of mild anemia. Teenage Indian mothers are 67.6% more vulnerable to mild anemia (OR = 1.676, p = 0.027) than pregnant women aged forty and above. Pregnant women in their twenties are also 56.7% more likely to experience mild anemia (OR = 1.567, p = 0.51) than pregnant women aged forty and above.
• The probability of both severe to moderate and mild anemia is more pronounced in pregnant women with lower BMI. As BMI increases, the likelihood of being anemic decreases. The 'obese' group was taken as the reference category.
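The percentages quoted in the multinomial results appear to follow the usual reading of an odds ratio as a percent change in odds, (OR − 1) × 100. The sketch below applies that reading to odds ratios quoted in the text; the function names are hypothetical, and the formula is a standard convention rather than a calculation confirmed by the authors.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in odds relative to the reference category."""
    return (odds_ratio - 1.0) * 100.0


def describe(odds_ratio):
    """Human-readable summary such as '9.5% lower odds'."""
    pct = percent_change_in_odds(odds_ratio)
    direction = "lower" if pct < 0 else "higher"
    return f"{abs(pct):.1f}% {direction} odds"


# Odds ratios quoted in the multinomial results above.
print(describe(0.905))  # urban vs. rural, severe to moderate anemia
print(describe(1.268))  # Eastern vs. Southern India, severe to moderate anemia
```

An odds ratio below 1 corresponds to lower odds than the reference category and an odds ratio above 1 to higher odds. Strictly, these are changes in odds rather than in probability.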

Discussion
The study identifies several important factors that increase the probability of pregnancy anemia in India. It indicates that the prevalence of anemia during pregnancy could be reduced by designing targeted schemes based on the analysis of variables such as place of residence, source of drinking water, type of toilet facility, sex of the head of the family, wealth index, religious affiliation of the head of household, educational level, geographic region, iodine level in salt, and high blood pressure.
The findings of this study indicate that pregnant women who reside in rural areas exhibit a higher likelihood of developing anemia than their urban counterparts. These results are similar to those of several other studies 47,48. The pace of modernization and industrialization is slower in rural areas than in urban areas. Health services are often unavailable to rural communities, and pregnant women are often unaware of proper nutrition, which could contribute to higher rates of anemia. To reduce the prevalence of anemia in India, effective interventions and government programs should be designed around the disparities in socioeconomic, health, and nutrition factors between rural and urban areas.
This study clearly indicates that education plays a vital role in preventing pregnancy-associated anemia. A person who has received an education tends to have better knowledge of nutrition and a better understanding of pregnancy than someone who has not had access to education. By being aware and knowledgeable, educated people can take better care of themselves or other women during pregnancy and reduce the risk of developing anemia 49,27. The literacy rate among women is one of the most crucial underlying factors in the prevalence of anemia. An educated woman typically has more access to healthcare services and facilities, which helps reduce the risk of anemia. As the level of education increases, women become better informed about the health risks associated with anemia. Consequently, the more educated women are, the better they are able to prevent or manage the condition.
This study revealed that, in India, pregnancy-associated anemia is significantly associated with religious affiliation, which agrees with several other studies 50,51. According to our findings, pregnant women belonging to Hindu communities suffer more from anemia than those of other religious groups. Pregnancy-associated anemia was less prominent in the Christian community. This variation in the prevalence of anemia between religious groups may be due to differences in dietary habits and lifestyles.
A significant association was found between the socioeconomic status of pregnant women and their risk of anemia. Pregnant women from low-income households in India are more susceptible to anemia than their high-income counterparts. Earlier studies report the same finding 27,52. This could be due to a lack of access to quality healthcare services. Poorer families may also have limited access to a wide range of foods, which can result in poor nutrition and anemia. Therefore, there is a crucial need for targeted efforts to address anemia in pregnant women from low-income families.
This research found an inverse relationship between the availability of toilet facilities and the occurrence of pregnancy-associated anemia. Other studies support the same finding 53,54. Several explanations for this relationship are plausible. Improved toilet facilities help to reduce parasitic infections such as hookworm, which are common causes of anemia. Additionally, improved sanitation facilities can promote better hygiene practices, such as handwashing, which can prevent other infectious diseases that may later cause anemia. The findings indicate that meeting basic needs such as access to safe and hygienic sanitation facilities can have a large influence on maternal health. They also suggest that investing in sanitation infrastructure could have a significant impact on reducing the burden of anemia among pregnant women.
As observed from the findings, the prevalence of anemia is not the same across all regions of India, demonstrating the importance of dedicated interventions to tackle the problem. Several other studies support this finding 55,56. Eastern regions of India showed high rates of pregnancy-associated anemia, which is alarming and requires immediate attention. Preventive measures may include increasing access to iron and folic acid supplements, nutrition education, and improved healthcare services for pregnant women in these regions. It is encouraging to note that northern and western India have a significantly lower prevalence of pregnancy-associated anemia compared to other regions. Possible reasons include better access to healthcare services and nutrition and higher levels of education in these regions. These regions can set a valuable example for other parts of India in implementing effective interventions aimed at reducing maternal anemia.
The prevalence of maternal anemia in India is not uniform across age groups; specific age groups are more affected. In particular, mild anemia is more prevalent in teenage and young adult Indian mothers. This finding is reported in several other studies as well 57,58. It may be due to underlying factors such as inadequate nutrition, limited prenatal care, socioeconomic disparities, and early pregnancies. On the other hand, pregnant women aged over 40 are more susceptible to severe anemia, probably due to reduced nutrient absorption, menopausal changes, chronic illnesses, dietary patterns, and so on.
In line with several other studies 59,47, the probability of pregnancy-associated anemia is higher in underweight women. Lower BMI is associated with malnutrition, indicating inadequate reserves of nutrients such as iron, vitamin B12, and folic acid, which are essential for the production of red blood cells. Conversely, individuals with a healthy BMI are more likely to have a balanced and nutritious diet, reducing the risk of nutritional deficiencies leading to anemia.
This research leads to an unexpected finding: the absence of a significant effect of iodized salt on pregnancy-associated anemia. Iodine deficiency is a well-known cause of goiter, intellectual impairment, and cretinism. Pregnant women require higher amounts of iodine to ensure proper development of the fetal brain, and therefore iodine supplementation is crucial during pregnancy. Our study suggests that, although iodized salt may help to prevent iodine deficiency, it may not contribute significantly to the prevention of anemia in pregnant women. However, this result contradicts some existing studies 56,60.

Limitation and future work
There are some limitations to our study of pregnancy-related anemia in India. Firstly, our reliance on the DHS 2019-2021 dataset restricts our findings to a specific timeframe, which means we might miss more recent changes in anemia prevalence. Moreover, the cross-sectional nature of the data limits our ability to establish causal relationships, highlighting the need for longitudinal studies to understand how these dynamics evolve over time. Additionally, the variables available in the dataset limit the scope of our analysis, so some other influential factors of anemia are left out. It is crucial to include more variables than the DHS dataset provides in order to encompass all the factors involved. Investigating the efficacy of interventions aimed at addressing these factors could also lead to promising strategies to minimize the occurrence of pregnancy-related anemia. India is a diverse country in its geography, culture, and economy. Therefore, it would be beneficial to concentrate on research that explores estimates at the micro level, such as districts and PSUs, in order to formulate more targeted policy recommendations.

Conclusion
This investigation aimed to identify the underlying determinants of anemia among pregnant women in India using information from the Demographic and Health Survey 2019-21 (India). More than half of the pregnant women in our research sample had anemia, indicating a significant prevalence of anemia in India. Our analysis identified several essential factors related to anemia in pregnant women, including the highest educational level, wealth index, geographical region of residence, type of toilet facility, religious affiliation of the household head, urbanicity of the residence, age, BMI, and family structure. From the bivariate and multivariate analyses, the highest educational level, wealth index, and geographical region were identified as the most crucial factors influencing anemia among pregnant women in India. These findings have important implications for policymakers and healthcare providers in India. Interventions targeted toward enhancing antenatal care services can potentially reduce the prevalence of anemia among pregnant women. Additionally, efforts to improve education and economic opportunities for women may also have a positive impact on the prevention and control of anemia. Overall, this study emphasizes the need for targeted interventions to mitigate the alarming prevalence of anemia among pregnant women in India.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Table 1. Description of the dataset.

Table 2. Significance of the variables using the chi-squared and gamma tests. Significant values are in bold.

Table 3. Estimates of the parameters of correlated predictor variables using the ordinal logistic regression model. Significant values are in bold.

Table 4. Estimates of the parameters of correlated predictor variables using the multinomial logistic regression model. Significant values are in bold.